Our situated environment is full of uncertainty and highly dynamic, thus hindering the widespread adoption of machine-led Intelligent Decision-Making (IDM) in real world scenarios. This means IDM should have the capability of continuously learning new skills and efficiently generalizing across wider applications. IDM benefits from any new approaches and theoretical breakthroughs that exhibit Artificial General Intelligence (AGI) breaking the barriers between tasks and applications. Recent research has well-examined neural architecture, Transformer, as a backbone foundation model and its generalization to various tasks, including computer vision, natural language processing, and reinforcement learning. We therefore argue that a foundation decision model (FDM) can be established by formulating various decision-making tasks as a sequence decoding task using the Transformer architecture; this would be a promising solution to advance the applications of IDM in more complex real world tasks. In this paper, we elaborate on how a foundation decision model improves the efficiency and generalization of IDM. We also discuss potential applications of a FDM in multi-agent game AI, production scheduling, and robotics tasks. Finally, through a case study, we demonstrate our realization of the FDM, DigitalBrain (DB1) with 1.2 billion parameters, which achieves human-level performance over 453 tasks, including text generation, images caption, video games playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 would be a baby step towards more autonomous and efficient real world IDM applications.
translated by 谷歌翻译
An approach to evolutionary ensemble learning for classification is proposed in which boosting is used to construct a stack of programs. Each application of boosting identifies a single champion and a residual dataset, i.e. the training records that thus far were not correctly classified. The next program is only trained against the residual, with the process iterating until some maximum ensemble size or no further residual remains. Training against a residual dataset actively reduces the cost of training. Deploying the ensemble as a stack also means that only one classifier might be necessary to make a prediction, so improving interpretability. Benchmarking studies are conducted to illustrate competitiveness with the prediction accuracy of current state-of-the-art evolutionary ensemble learning algorithms, while providing solutions that are orders of magnitude simpler. Further benchmarking with a high cardinality dataset indicates that the proposed method is also more accurate and efficient than XGBoost.
translated by 谷歌翻译
基于深度学习的视网膜病变分割方法通常需要大量精确的像素注释数据。但是,概述病变区域的圆形或椭圆等粗糙注释的效率可能是像素级注释的六倍。因此,本文提出了一个注释细化网络,以将粗注释转换为像素级分割掩码。我们的主要新颖性是原型学习范式的应用来增强不同数据集或类型病变的概括能力。我们还引入了一个原型称量模块,以处理过度较小的病变的具有挑战性的病例。提出的方法对公开可用的IDRID数据集进行了培训,然后概括为公共DDR和我们的现实世界私人数据集。实验表明,我们的方法显着改善了初始的粗蒙版,并以较大的边缘优于非概率基线。此外,我们证明了原型称量模块在跨数据库和跨阶级设置中的实用性。
translated by 谷歌翻译
近年来目睹了采用灵活的机械学习模型进行乐器变量(IV)回归的兴趣,但仍然缺乏不确定性量化方法的发展。在这项工作中,我们为IV次数回归提出了一种新的Quasi-Bayesian程序,建立了最近开发的核化IV模型和IV回归的双/极小配方。我们通过在$ l_2 $和sobolev规范中建立最低限度的最佳收缩率,并讨论可信球的常见有效性来分析所提出的方法的频繁行为。我们进一步推出了一种可扩展的推理算法,可以扩展到与宽神经网络模型一起工作。实证评价表明,我们的方法对复杂的高维问题产生了丰富的不确定性估计。
translated by 谷歌翻译
在共享潜在空间中对齐两个或更多个分布的无监督任务具有许多应用,包括公平表示,批量效果缓解和无监督域适应。现有的基于流动的方法独立估计多个流动,这相当于学习多个完整的生成模型。其他方法需要对抗性学习,这可以是可以计算地昂贵和挑战的优化。因此,我们的目标是在避免对抗性学习的同时联合对齐多个分布。通过从最佳运输(OT)理论的高效对准算法的启发,我们开发了一种简单的迭代方法来构建深层和富有效力的流动。我们的方法将每次迭代分成两个子问题:1)形成分配分配的变化近似,并且2)通过基于已知的OT结果的闭合形式可逆对准映射最小化该变分近似。我们的经验结果证明了这种迭代算法以低计算成本实现了竞争分布对准,同时能够自然地处理两个以上的分布。
translated by 谷歌翻译
We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance in three value alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and ease for interactive error correction. Extensive human evaluations further confirm its effectiveness.
translated by 谷歌翻译
This work studies training one-hidden-layer overparameterized ReLU networks via gradient descent in the neural tangent kernel (NTK) regime, where, differently from the previous works, the networks' biases are trainable and are initialized to some constant rather than zero. The first set of results of this work characterize the convergence of the network's gradient descent dynamics. Surprisingly, it is shown that the network after sparsification can achieve as fast convergence as the original network. The contribution over previous work is that not only the bias is allowed to be updated by gradient descent under our setting but also a finer analysis is given such that the required width to ensure the network's closeness to its NTK is improved. Secondly, the networks' generalization bound after training is provided. A width-sparsity dependence is presented which yields sparsity-dependent localized Rademacher complexity and a generalization bound matching previous analysis (up to logarithmic factors). As a by-product, if the bias initialization is chosen to be zero, the width requirement improves the previous bound for the shallow networks' generalization. Lastly, since the generalization bound has dependence on the smallest eigenvalue of the limiting NTK and the bounds from previous works yield vacuous generalization, this work further studies the least eigenvalue of the limiting NTK. Surprisingly, while it is not shown that trainable biases are necessary, trainable bias helps to identify a nice data-dependent region where a much finer analysis of the NTK's smallest eigenvalue can be conducted, which leads to a much sharper lower bound than the previously known worst-case bound and, consequently, a non-vacuous generalization bound.
translated by 谷歌翻译
Recent years have witnessed the tremendous progress of 3D GANs for generating view-consistent radiance fields with photo-realism. Yet, high-quality generation of human radiance fields remains challenging, partially due to the limited human-related priors adopted in existing methods. We present HumanGen, a novel 3D human generation scheme with detailed geometry and $\text{360}^{\circ}$ realistic free-view rendering. It explicitly marries the 3D human generation with various priors from the 2D generator and 3D reconstructor of humans through the design of "anchor image". We introduce a hybrid feature representation using the anchor image to bridge the latent space of HumanGen with the existing 2D generator. We then adopt a pronged design to disentangle the generation of geometry and appearance. With the aid of the anchor image, we adapt a 3D reconstructor for fine-grained details synthesis and propose a two-stage blending scheme to boost appearance generation. Extensive experiments demonstrate our effectiveness for state-of-the-art 3D human generation regarding geometry details, texture quality, and free-view performance. Notably, HumanGen can also incorporate various off-the-shelf 2D latent editing methods, seamlessly lifting them into 3D.
translated by 谷歌翻译
Privacy noise may negate the benefits of using adaptive optimizers in differentially private model training. Prior works typically address this issue by using auxiliary information (e.g., public data) to boost the effectiveness of adaptive optimization. In this work, we explore techniques to estimate and efficiently adapt to gradient geometry in private adaptive optimization without auxiliary data. Motivated by the observation that adaptive methods can tolerate stale preconditioners, we propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to better realize the benefits of adaptivity. Theoretically, we provide convergence guarantees for our method for both convex and non-convex problems, and analyze trade-offs between delay and privacy noise reduction. Empirically, we explore DP^2 across several real-world datasets, demonstrating that it can improve convergence speed by as much as 4x relative to non-adaptive baselines and match the performance of state-of-the-art optimization methods that require auxiliary data.
translated by 谷歌翻译
System auditing has emerged as a key approach for monitoring system call events and investigating sophisticated attacks. Based on the collected audit logs, research has proposed to search for attack patterns or track the causal dependencies of system events to reveal the attack sequence. However, existing approaches either cannot reveal long-range attack sequences or suffer from the dependency explosion problem due to a lack of focus on attack-relevant parts, and thus are insufficient for investigating complex attacks. To bridge the gap, we propose Zebra, a system that synergistically integrates attack pattern search and causal dependency tracking for efficient attack investigation. With Zebra, security analysts can alternate between search and tracking to reveal the entire attack sequence in a progressive, user-guided manner, while mitigating the dependency explosion problem by prioritizing the attack-relevant parts. To enable this, Zebra provides (1) an expressive and concise domain-specific language, Tstl, for performing various types of search and tracking analyses, and (2) an optimized language execution engine for efficient execution over a big amount of auditing data. Evaluations on a broad set of attack cases demonstrate the effectiveness of Zebra in facilitating a timely attack investigation.
translated by 谷歌翻译